perm filename CLNTLN.MSG[COM,LSP]8 blob sn#853870 filedate 1988-02-29 generic text, type C, neo UTF8
∂17-Dec-87  1712	CL-Characters-mailer 	test    
Received: from IBM.COM by SAIL.STANFORD.EDU with TCP; 17 Dec 87  17:12:33 PST
Date: Thu, 17 Dec 87 11:35:57 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <871217.113557.baggins@IBM.com>
Subject: test

  test of new router name

∂17-Dec-87  1809	CL-Characters-mailer 	mailbox name change, JEIDA interaction,  sub-topics  
Received: from IBM.COM by SAIL.STANFORD.EDU with TCP; 17 Dec 87  18:09:01 PST
Date: Thu, 17 Dec 87 17:46:49 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <871217.174649.baggins@IBM.com>
Subject: mailbox name change, JEIDA interaction,  sub-topics
Subject: new mailbox router is now operational

As evidenced by the rejected message below,
cl-natural-languages is no more.  Please use cl-characters.

Regards,
  Thom

------------------------------------------------------------


Date: 17 Dec 87 10:59:48
From: Mailer-Daemon at IBM.COM
To: BAGGINS

IBM.COM Mail Server unable to deliver the following mail to:cl-natural-languages
Reason:
Negative reply from Host:sail.stanford.edu
550 I don't know anybody named cl-natural-languages

           ** Text of Mail follows **
Date: Thu, 17 Dec 87 10:31:39 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee"
    <cl-natural-languages@sail.stanford.edu>
Message-ID: <871217.103139.baggins@IBM.com>
Subject: mailbox name change, JEIDA interaction,  sub-topics

  Sometime soon, our router at Stanford will change to cl-characters.
I'll broadcast as soon as I determine it is operational.

  My counterpart at the IBM Tokyo Research Lab presented the IBM
character extensions proposal at a JEIDA meeting in Nov.  JEIDA knows
that this has not yet been discussed by our ANSI committee.

  Per our discussion at the Ft Collins meeting, I am inviting ISO and JEIDA
to join our conferencing (via the Stanford router as soon as the
new name is in effect).

  Larry made the reasonable suggestion that we decide
on the sub-topics of the proposals and deal with each (initially)
somewhat independently.

  Hopefully, everyone has a copy of the proposal material by now!
Let me know if not and I will ship a copy asap.

  My stab at sub-topics is:

     Type hierarchy
        e.g. thin-string

     Explicit character set manipulation
        e.g. define-char-set

     Equivalence
        e.g. define-equivalence-class

     I/O interface
        e.g. print-width

     Character set (or subset) predicates
        e.g. jcl:jis-char-p


  Other suggestions?

Happy Holidays,
  Thom

∂21-Dec-87  1918	CL-Characters-mailer 	Network communications 
Received: from IBM.COM by SAIL.STANFORD.EDU with TCP; 21 Dec 87  10:59:00 PST
Date: Mon, 21 Dec 87 10:13:40 PST
From: Thom Linden <baggins@ibm.com>
To: "Dr. Takayasu Ito" <tito%aoba.aoba.tohoku.junet@relay.cs.net>,
    "Dr. Taiichi Yuasa" <yuasa%kurims.kurims.kyoto-u.junet@relay.cs.net>
cc: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <871221.101340.baggins@IBM.com>
Subject: Network communications

  The ANSI subcommittee handling character issues communicates
over the networks via a broadcast node (cl-characters) at Stanford.
You and/or the interested members of your committees are encouraged
to participate in these conversations.  If you inform me of the
appropriate net ids, I will have them added to the distribution
list.

Regards,
  Thom Linden

∂22-Dec-87  0600	CL-Characters-mailer 	Type hierarchy    
Received: from XEROX.COM by SAIL.STANFORD.EDU with TCP; 22 Dec 87  06:00:07 PST
Received: from Cabernet.ms by ArpaGateway.ms ; 22 DEC 87 06:01:12 PST
Date: 22 Dec 87 05:59 PST
From: Masinter.pa@Xerox.COM
Subject: Type hierarchy
In-reply-to: Thom Linden <baggins@ibm.com>'s message of Thu, 17 Dec 87 17:46:49
 PST
To: cl-characters@sail.stanford.edu
Message-ID: <871222-060112-6764@Xerox>

I've spent some time thinking about this:

I think it is a fundamental error, an unacceptable incompatible change, to
change the Common Lisp type STRING to be something other than (VECTOR
STRING-CHAR), as is suggested by all of the extant proposals.

I think one of our fundamental design goals is that the extended language
features being proposed be in fact extensions, in that current CL functions not
be in error.

Currently, you can assume after (TYPEP x 'STRING) that X can hold any
STRING-CHAR element. Allowing STRING to denote several different types of vector
whose element types are < STRING-CHAR would violate that assumption.

It isn't necessary to change STRING in an incompatible way, however. What is
really the intent of these proposals is to extend the various functions in CL
that currently take "STRING" to also allow them to take other types as well.

Suppose we define a new type

(defun character-vector-p (x) 
   (and (vectorp x) (subtypep (array-element-type x) 'string-char)))

(deftype character-vector () '(satisfies character-vector-p))

Now extend all functions that take strings as input arguments and have them
accept any kind of character-vector. 

∂29-Dec-87  1449	CL-Characters-mailer 	Type hierarchy    
Received: from SCRC-RIVERSIDE.ARPA by SAIL.STANFORD.EDU with TCP; 29 Dec 87  14:49:26 PST
Received: from LM1.NSC.DIALNET.SYMBOLICS.COM by Riverside.SCRC.Symbolics.COM via DIAL with SMTP id 214218; 29 Dec 87 12:52:39 EST
Received: from LM2.NSC.Dialnet.Symbolics.COM by LM1.NSC.Dialnet.Symbolics.COM via CHAOS with CHAOS-MAIL id 19079; Tue 29-Dec-87 23:37:20 JST
Date: Tue, 29 Dec 87 23:37 JST
From: Carl Hoffman <CWH@LM1.NSC.Dialnet.Symbolics.COM>
Subject: Type hierarchy
To: Masinter.pa@Xerox.COM, CL-Characters@SAIL.Stanford.EDU
cc: Shiota@LM1.NSC.Dialnet.Symbolics.COM
In-Reply-To: <871222-060112-6764@Xerox>
Message-ID: <871229233714.3.CWH@LM2.NSC.Dialnet.Symbolics.COM>

    Date: 22 Dec 87 05:59 PST
    From: Masinter.pa@Xerox.COM

    I think it is a fundamental error, an unacceptable incompatible change, to
    change the Common Lisp type STRING to be something other than (VECTOR
    STRING-CHAR), as is suggested by all of the extant proposals.

Why do you feel that this is a fundamental error?  In the Symbolics Genera 7.1
implementation, the type STRING is the same as (OR (VECTOR STRING-CHAR) (VECTOR
CHARACTER)).  As far as I can tell, this hasn't caused a major compatibility
problem.  The CL programs I've seen which use strings have all run in the
Symbolics implementation without modification.

The Symbolics implementation returns the following results:

(TYPEP (MAKE-ARRAY 1 :ELEMENT-TYPE 'CHARACTER) '(VECTOR STRING-CHAR)) -> NIL
(TYPEP (MAKE-ARRAY 1 :ELEMENT-TYPE 'STRING-CHAR) '(VECTOR CHARACTER)) -> NIL
(TYPEP (MAKE-ARRAY 1 :ELEMENT-TYPE 'CHARACTER) 'STRING)               -> T
(STRINGP (MAKE-ARRAY 1 :ELEMENT-TYPE 'CHARACTER))                     -> T
(TYPEP (MAKE-ARRAY 1 :ELEMENT-TYPE 'STRING-CHAR) 'STRING)             -> T
(STRINGP (MAKE-ARRAY 1 :ELEMENT-TYPE 'STRING-CHAR))                   -> T

MAKE-ARRAY ELEMENT-TYPE 'STRING-CHAR returns an array which allocates 8 bits
per character.  MAKE-ARRAY ELEMENT-TYPE 'CHARACTER returns an array which
allocates 28 bits per character (16 bits of code, 8 bits of font, and 4 bits of
modifier).
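A sketch of how a 28-bit character cell with the field widths Hoffman lists (16 bits of code, 8 of font, 4 of modifier bits) could be packed into a single integer with DPB and LDB. It is written for present-day ANSI Common Lisp; the function names and the bit positions (code in the low bits, then font, then modifier bits) are assumptions for illustration, not the actual Symbolics layout.

```lisp
;; Hypothetical field layout for a 28-bit character cell.  The widths come
;; from Hoffman's message; the bit positions are an assumption:
;;   bits  0-15 : character code (16 bits)
;;   bits 16-23 : font           (8 bits)
;;   bits 24-27 : modifier bits  (4 bits)
(defun pack-char-cell (code font bits)
  "Pack CODE, FONT and BITS into one integer.  DPB and LDB silently drop
any high-order bits that do not fit in a field."
  (dpb bits (byte 4 24)
       (dpb font (byte 8 16)
            (ldb (byte 16 0) code))))

(defun cell-code (cell) (ldb (byte 16 0) cell))
(defun cell-font (cell) (ldb (byte 8 16) cell))
(defun cell-bits (cell) (ldb (byte 4 24) cell))

;; The widest possible cell occupies exactly 16 + 8 + 4 = 28 bits.
(integer-length (pack-char-cell #xFFFF #xFF #xF))   ; => 28
```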

I believe that the current plan is to change MAKE-ARRAY ELEMENT-TYPE
'STRING-CHAR to return an array which allocates 16 bits per character (for 16
bits of code) and to use MAKE-ARRAY ELEMENT-TYPE 'STANDARD-CHAR to do what is
currently done with MAKE-ARRAY ELEMENT-TYPE 'STRING-CHAR.

Incidentally, I haven't heard any discussion of Moon's proposal that we simply
use the type STANDARD-CHAR to mean "lowest overhead character storage class"
rather than introducing a new type THIN-CHAR or INTERNAL-THIN-CHAR.

    Currently, you can assume after (TYPEP x 'STRING) that X can hold any
    STRING-CHAR element. Allowing STRING to denote several different types of vector
    whose element types are < STRING-CHAR would violate that assumption.

Why not just declare that assumption obsolete, and replace it with the
assumption that if (TYPEP X '(VECTOR STRING-CHAR)) then X can hold any
STRING-CHAR element.  Can you give me some examples of code which make use of
your assumption?

    It isn't necessary to change STRING in an incompatible way, however. What is
    really the intent of these proposals is to extend the various functions in CL
    that currently take "STRING" to also allow them to take other types as well.

That is only part of the intent.  It is also important that the following
forms return T.  (Assume that # represents a Japanese character.)

  (STRINGP "#")
  (TYPEP "#" 'STRING)
  (TYPEP (CHAR "#" 0) 'STRING-CHAR)

If the above forms do not return T, then many CL programs originally written to
handle only standard characters will not work when running in an environment
which has Japanese characters.  A major goal of this proposal is to allow these
programs to run without modification.  I can show you many programs which
require that the above forms return T.

    Suppose we define a new type

    (defun character-vector-p (x) 
       (and (vectorp x) (subtypep (array-element-type x) 'string-char)))

    (deftype character-vector () '(satisfies character-vector-p))

    Now extend all functions that take strings as input arguments and have them
    accept any kind of character-vector. 

If you replace STRING-CHAR in your example with CHARACTER, then this is exactly
the same as what Symbolics has already done with the STRINGP function and the
STRING data type.
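A minimal sketch of Masinter's predicate with CHARACTER substituted for STRING-CHAR, as Hoffman suggests. It is written for present-day ANSI Common Lisp, which no longer has STRING-CHAR; there BASE-CHAR plays the "thin" role and is a subtype of CHARACTER. The example vectors are illustrative only.

```lisp
;; Masinter's CHARACTER-VECTOR-P with STRING-CHAR replaced by CHARACTER.
(defun character-vector-p (x)
  (and (vectorp x)
       ;; SUBTYPEP returns two values; only the first is used here.
       (subtypep (array-element-type x) 'character)
       t))

(deftype character-vector () '(satisfies character-vector-p))

;; Illustrative vectors: one "thin", one "fat", one general (element type T).
(defparameter *thin* (make-array 3 :element-type 'base-char :initial-element #\a))
(defparameter *fat* (make-array 3 :element-type 'character :initial-element #\a))
(defparameter *general* (make-array 3 :initial-element #\a))

(character-vector-p *thin*)      ; => T
(character-vector-p *fat*)       ; => T
(character-vector-p *general*)   ; => NIL
```

Under this definition both the thin and the fat vector satisfy CHARACTER-VECTOR, which is the behavior Hoffman describes for STRING in Genera 7.1.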


∂06-Jan-88  2217	CL-Characters-mailer 	Re: Type hierarchy
Received: from XEROX.COM by SAIL.STANFORD.EDU with TCP; 6 Jan 88  22:17:01 PST
Received: from Cabernet.ms by ArpaGateway.ms ; 06 JAN 88 22:17:43 PST
Date: 6 Jan 88 22:16 PST
From: Masinter.pa@Xerox.COM
Subject: Re: Type hierarchy
In-reply-to: Carl Hoffman <CWH@LM1.NSC.Dialnet.Symbolics.COM>'s message of Tue,
 29 Dec 87 23:37 JST
To: CWH@LM1.NSC.Dialnet.Symbolics.COM
cc: Masinter.pa@Xerox.COM, CL-Characters@SAIL.Stanford.EDU,
 Shiota@LM1.NSC.Dialnet.Symbolics.COM
Message-ID: <880106-221743-6432@Xerox>

I've composed several replies and not sent them. My time is getting tight so I
have to send something. The problem is, can you have something that is a string
for which it is illegal to store a string-char into it?  No, in SCL. But if you
allow (vector standard-char) to also be a subtype of string, then you can have
vectors that can only hold standard-char and not string-char.

However, on even further reflection, there are many "read-only" strings, e.g.,
strings as program constants, for which it is an error to store *anything*. 

If we remove char-bits and char-font, we can get rid of the distinction between
string-char and character. This would be an improvement.

Most of the stuff in CLtL about the string type can in fact simply be removed,
while simplifying the language.

∂07-Jan-88  2036	CL-Characters-mailer 	X3J13 meeting in March 
Received: from IBM.COM by SAIL.STANFORD.EDU with TCP; 7 Jan 88  20:36:20 PST
Date: Wed, 06 Jan 88 09:58:06 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880106.095806.baggins@IBM.com>
Subject: X3J13 meeting in March

  I have arranged for our subcommittee to meet at the IBM Almaden
Research Centre on 14,15,18 March.  Please let me know if this
poses any difficulties.  Also, please let me know if your travel
arrangements or other commitments prevent your attending all or
part.

  ARC is south of Palo Alto, roughly a 40 to 50 min commute.
I would suggest our meetings begin at 10am so we can avoid most
of the morning freeway congestion.  I'll provide more detailed
directions later.

Regards,
  Thom

∂07-Jan-88  2036	CL-Characters-mailer 	subcommittee mailing list   
Received: from IBM.COM by SAIL.STANFORD.EDU with TCP; 7 Jan 88  20:36:05 PST
Date: Tue, 05 Jan 88 21:42:40 PST
From: Thom Linden <baggins@ibm.com>
To: "Richard P. Gabriel" <rpg@sail.stanford.edu>
cc: "Dr. Takayasu Ito" <ito%ito.aoba.tohoku.junet@relay.cs.net>,
    "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880105.214240.baggins@IBM.com>
Subject: subcommittee mailing list

Dick,
  Please add the following individuals to the character subcommittee
mailing list:

  Yuasa:     yuasa@tutics.tut.junet
  Umemura:   umemura@nuesun.NTT.junet
  Kurokawa:  KUROKAWA%jpntscvm.bitnet%wiscvm.wisc.edu
  Yasumura:  yasumura@harl86.harl.hitachi.junet

Regards,
  Thom

∂07-Jan-88  2036	CL-Characters-mailer 	Comments on IBM Proposal from Dave Unitas (LUCID)    
Received: from IBM.COM by SAIL.STANFORD.EDU with TCP; 7 Jan 88  20:36:41 PST
Date: Wed, 06 Jan 88 12:50:38 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880106.125038.baggins@IBM.com>
Subject: Comments on IBM Proposal from Dave Unitas (LUCID)

  I have attached some comments on the proposal compiled by
Dave Unitas at LUCID.

  Both A and C seem to be good suggestions.


--------------------------------------------------------------------------------

A. Each character set is identified by its Character Set Name, a symbol,
   and an associated Character Set Number, a non-negative integer. (Replace
   CSID by Character Set Name or Character Set Number throughout the
   document).

   Replace char-split and char-join with:

     char-code-point char-code

   takes a character code and returns the component code-point.

     char-code-set char-code

   takes a character code and returns the component character set.

     make-char-code code-point &optional (character-set 0)

   takes a code-point and an optional character set and returns the
   character code.  The character set may be specified either as a
   Character Set Name or Character Set Number.


   Rename define-char-set to be define-character-set.  Make the arguments
   keywords rather than positionals.  If character-set-number is not
   specified, it is assigned from an available character set number
   below character-set-limit.

   Note:  Lucid as a whole is as yet undecided about whether user-
   defined character sets are generally useful enough to need to be
   included in the language.
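A sketch of the item A functions in present-day ANSI Common Lisp. The proposal does not fix how a character code is laid out, so this assumes (hypothetically) that a code is the Character Set Number times 2^16 plus a 16-bit code-point; the name-to-number registry and the set name :JIS-X0208 are also hypothetical. CHAR-CODE-SET here returns the set as a number, since the proposal leaves the return form open.

```lisp
;; ASSUMPTIONS, not part of the proposal: a character code is
;;   (character-set-number * 2^16) + code-point,
;; and the registry of Character Set Names below is hypothetical.
(defconstant +code-point-bits+ 16)

(defparameter *character-set-numbers*
  '((:base . 0) (:jis-x0208 . 1))
  "Hypothetical mapping from Character Set Names to Character Set Numbers.")

(defun character-set-number (character-set)
  "Accept either a Character Set Name (a symbol) or a Character Set Number."
  (etypecase character-set
    ((integer 0) character-set)
    (symbol (or (cdr (assoc character-set *character-set-numbers*))
                (error "Unknown character set name: ~S" character-set)))))

(defun char-code-point (char-code)
  "Return the code-point component of CHAR-CODE."
  (ldb (byte +code-point-bits+ 0) char-code))

(defun char-code-set (char-code)
  "Return the character set component of CHAR-CODE as a Character Set Number."
  (ash char-code (- +code-point-bits+)))

(defun make-char-code (code-point &optional (character-set 0))
  "Combine CODE-POINT with CHARACTER-SET, given as a name or a number."
  (check-type code-point (integer 0 #xFFFF))  ; must fit the 16-bit field
  (+ (ash (character-set-number character-set) +code-point-bits+)
     code-point))
```

With this layout, (make-char-code #x2422 :jis-x0208) and (make-char-code #x2422 1) yield the same code, and CHAR-CODE-POINT and CHAR-CODE-SET recover the two components.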

B. We are still unsure about whether the type system should be extended
   to include extended strings of a particular character set or sets.

C. When printing a character from an extended character set to a stream which only accepts
   base characters, it is printed in the form

   #\name:xxxx

   where name identifies the character set of the character, and xxxx
   is the code-point of the character in hex.  Strings containing
   extended characters are printed in the following form when written
   to a base-character only stream:

   #( char0 char1 char2 ...)

   with charn as above, following the standard Common Lisp vector
   printing convention.
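A sketch of the item C output forms in present-day ANSI Common Lisp. Since a portable example cannot create the proposal's extended characters, it assumes a hypothetical stand-in: an extended character is modeled as a (set-name . code-point) pair with a keyword set name, while ordinary Lisp characters stand for base characters. The function names are hypothetical.

```lisp
;; Hypothetical representation: an extended character is a
;; (set-name . code-point) pair; ordinary Lisp characters are base characters.
(defun write-char-for-base-stream (ch stream)
  (if (characterp ch)
      (prin1 ch stream)                                   ; ordinary #\x syntax
      (format stream "#\\~A:~4,'0X" (car ch) (cdr ch))))  ; #\name:xxxx

(defun write-string-for-base-stream (chars stream)
  "Write the list CHARS as #(char0 char1 ...), like a printed vector."
  (write-string "#(" stream)
  (loop for (ch . more) on chars
        do (write-char-for-base-stream ch stream)
           (when more (write-char #\Space stream)))
  (write-string ")" stream))
```

For example, writing the list (#\a (:jis . #x2422)) produces #(#\a #\JIS:2422), with the code-point zero-padded to four hex digits.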

∂10-Jan-88  0010	CL-Characters-mailer 	Re: Comments on IBM Proposal from Dave Unitas (LUCID)
Received: from Xerox.COM by SAIL.Stanford.EDU with TCP; 10 Jan 88  00:10:01 PST
Received: from Cabernet.ms by ArpaGateway.ms ; 10 JAN 88 00:10:39 PST
Date: 10 Jan 88 00:09 PST
From: Masinter.pa@Xerox.COM
Subject: Re: Comments on IBM Proposal from Dave Unitas (LUCID)
In-reply-to: Thom Linden <baggins@ibm.com>'s message of Wed, 06 Jan 88 12:50:38
 PST
To: baggins@ibm.com
cc: cl-characters@sail.stanford.edu
Message-ID: <880110-001039-3219@Xerox>

I think one of the problems with the discussion so far is that we've not agreed
really on the fundamental issue of whether the standard is for an optional
extension or for a required part of the standard.

For the record, I think that we should be designing things that are a required
part of every Common Lisp implementation. That is, every function, variable,
etc. in our standard should be in every Common Lisp implementation.  In some
implementations, the characters they work on are of course only 7 or 8-bit
ASCII, but all of the functions are there, and if the implementation has more
characters or Japanese characters, the same code will work.

If this is a required part of Common Lisp, we should try to keep to a minimum
the number of new functions, variables, and behaviors we expect from a Common
Lisp implementation. 

I don't think that the introduction of new functions and variables for dealing
with character sets really fits that criterion. The only situation where allowing
exposure to multiple character sets within a single implementation makes sense
is one in which the host operating system does not contain facilities to do
character set translation, and yet the programmer is unwilling to do that
character set translation directly (using binary READ-BYTE and WRITE-BYTE). This seems
like an extremely narrow application domain for the dozens of functions and
variables which exist in the IBM proposal.

= = = = 

As a side note, the IBM proposal contains a fairly serious design flaw: the
Common Lisp design is generally careful to avoid having dynamically modifiable
global state that isn't rebindable; e.g., although you can change macro
characters, all changes happen to *readtable*, etc.  Yet the character code
equivalency tables in the IBM proposal are global and not yet bindable. Even if
this isn't part of the standard but an internal library for you, you should fix
it.

= = = = = =
About the type system: the discussion on Common-Lisp@sail.stanford.edu on array
element type upgrading is relevant to the type hierarchy here. Suppose arrays
remember their element type. Redefine (stringp x) = (and (vectorp x) (subtypep
(array-element-type x) 'character)). 

If you want to make a string that consists of only (capital) vowels, you can
say
(make-array 10 :element-type '(member #\A #\E #\I #\O #\U)).



= = = = = = =
Re: "C. When printing an extended character set to a stream which only accepts
   base characters, it is printed in the form ... Strings containing
   extended characters are printed in the following form when written
   to a base-character only stream ..."

How are symbols that contain extended characters printed?

What happens when you call PRINC (which is supposed to not include the #\)?

I think this is a bad design. If you want to write extended characters on a base
stream, you should design a character-by-character encoding with escape
characters, and have the write-char primitive for the base stream turn the
extended characters (and the escape) into an escaped character sequence. These
alternative print sequences only handle a small percentage of the situations.
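A minimal sketch of the kind of character-by-character escape encoding Masinter describes, in present-day ANSI Common Lisp. The escape character (#\%), the "%HEX;" sequence, the 7-bit cutoff for base characters, and all function names are assumptions for illustration only.

```lisp
(defparameter *base-escape* #\%)   ; hypothetical escape character

(defun write-char-escaped (ch stream)
  "Write CH to a base-only STREAM.  Assumes base characters are those with
codes below 128; the escape character itself is doubled."
  (cond ((char= ch *base-escape*)
         (write-char ch stream)
         (write-char ch stream))
        ((< (char-code ch) 128)
         (write-char ch stream))
        (t                                   ; extended: %HEX;
         (format stream "~C~X;" *base-escape* (char-code ch)))))

(defun encode-for-base-stream (string)
  (with-output-to-string (out)
    (loop for ch across string
          do (write-char-escaped ch out))))

(defun decode-from-base-stream (encoded)
  "Invert ENCODE-FOR-BASE-STREAM.  Assumes ENCODED is well formed."
  (with-output-to-string (out)
    (let ((i 0)
          (n (length encoded)))
      (loop while (< i n)
            do (let ((ch (char encoded i)))
                 (cond ((char/= ch *base-escape*)        ; ordinary character
                        (write-char ch out)
                        (incf i))
                       ((and (< (1+ i) n)                ; doubled escape
                             (char= (char encoded (1+ i)) *base-escape*))
                        (write-char ch out)
                        (incf i 2))
                       (t                                ; %HEX; sequence
                        (let ((end (position #\; encoded :start i)))
                          (write-char (code-char (parse-integer encoded
                                                                :start (1+ i)
                                                                :end end
                                                                :radix 16))
                                      out)
                          (setf i (1+ end))))))))))
```

Because the encoding works one character at a time, the same scheme covers symbols, PRINC output, and strings alike, which is the point of Masinter's objection to special print syntaxes.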





∂22-Jan-88  0005	CL-Characters-mailer 	Equivalence binding    
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 22 Jan 88  00:05:45 PST
Date: Thu, 21 Jan 88 23:56:14 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880121.235614.baggins@IBM.com>
Subject: Equivalence binding

Larry's comment on the binding of equivalency tables is well taken.
Our view of the expected usage of these tables plus trying to keep
the proposed changes to a minimum argued against bindable tables.
Language consistency argues the other way.  The introduction of
an equivalence-table object and an associated global *equivalence-table*
variable would make this more in line with the 'spirit' of CL.
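A sketch, in present-day ANSI Common Lisp, of what a bindable equivalence table could look like. The table representation (a hash table mapping each character to its class representative) and the names MAKE-EQUIVALENCE-TABLE, *EQUIVALENCE-TABLE* and CHAR-EQUIVALENT-P are hypothetical; the point is that a special variable can be rebound with LET, just as *READTABLE* can.

```lisp
(defun make-equivalence-table (&rest classes)
  "Each element of CLASSES is a list of characters to be treated as
equivalent; its first element serves as the class representative."
  (let ((table (make-hash-table)))
    (dolist (class classes table)
      (dolist (ch class)
        (setf (gethash ch table) (first class))))))

(defvar *equivalence-table* (make-equivalence-table)
  "Current equivalence table.  Rebind it with LET instead of mutating it.")

(defun char-equivalent-p (a b &optional (table *equivalence-table*))
  "True if A and B fall in the same equivalence class under TABLE."
  (flet ((canonical (ch) (gethash ch table ch)))
    (char= (canonical a) (canonical b))))

;; The global table treats every character as its own class ...
(char-equivalent-p #\a #\A)                                ; => NIL
;; ... while a dynamically bound table affects only its extent.
(let ((*equivalence-table* (make-equivalence-table '(#\a #\A))))
  (char-equivalent-p #\a #\A))                             ; => T
```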

∂22-Jan-88  0137	CL-Characters-mailer 	redefining STANDARD-CHAR    
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 22 Jan 88  01:37:17 PST
Date: Fri, 22 Jan 88 01:31:08 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880122.013108.baggins@IBM.com>
Subject: redefining STANDARD-CHAR

  Carl's comment on STANDARD-CHAR == lowest overhead character
storage class is precisely what 'base-character' was defined to
be.  The rationale for STANDARD-CHAR being the small set of 96
glyphs is based on portability. Programs constrained to the
limited set are likely to be portable across a larger range of
systems and architectures.  While this is probably true (can
anyone testify to this?), it may not warrant a unique type.

  Other languages typically define a set of 'standard' characters
used for the construction of programs.  Does anyone know of a language
other than Lisp which equates this set with a unique type?

  I think distinguishing this 'lowest overhead storage class'
type is essential.  The distinction must be made for efficiency reasons:
it's unacceptable to force the use of 16-bit cells for all
characters in multi-lingual environments.

∂22-Jan-88  0202	CL-Characters-mailer 	Type Hierarchies  
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 22 Jan 88  02:01:50 PST
Date: Fri, 22 Jan 88 01:52:46 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880122.015246.baggins@IBM.com>
Subject: Type Hierarchies

No one has mentioned Bob Kerns's document.  The type hierarchies
in the JEIDA, IBM and Kern documents are essentially identical
(excepting thin vs. base, fat vs. extended, and Bob's user-extensions).

  Bob makes a valid point that the two-byte encodings may make way
for three, etc. later.  But, it seems best to hide that from the
language as much as possible.  I suggest that extended would always
mean the 'largest overhead character storage class'.

∂22-Jan-88  0224	CL-Characters-mailer 	Font    
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 22 Jan 88  02:24:34 PST
Date: Fri, 22 Jan 88 02:20:47 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880122.022047.baggins@IBM.com>
Subject: Font

  Bob Kerns's paper contains a set of changes to eliminate char-font
and allows for some migratory behavior.  I think the [migration]
aids should not be made part of the standard, but should instead be
suggested bridges that an implementation may provide.  I would like to
take a straw vote over the network on everyone else's opinion.

   In summary:  (I, not Bob, marked items [migration])


      13.1 Character Attributes

          {eliminate references to font}


[migration]   char-font-limit
                   The value of char-font-limit is 1, unless the
                   implementation implements the obsolete char-font
                   feature.

      13.2 Predicates on Characters

          {eliminate references to font}

      13.3 Character Construction and Selection

          {eliminate references to font}

[migration]   char-font
                   This function is obsolete, and returns 0 for
                   compatibility.

[migration]   make-char char &optional (bits 0) (font 0)
                   (font 0) exists for compatibility.

      13.4 Character Conversions

          {eliminate references to font}

[migration]   digit-char weight &optional (radix 10) (font 0)
                   (font 0) exists for compatibility.
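A sketch, in present-day ANSI Common Lisp, of the kind of [migration] bridge an implementation might provide outside the standard. The COMPAT- names are hypothetical and chosen to avoid clashing with any implementation's own symbols; MAKE-CHAR is omitted because ANSI Common Lisp has no counterpart to build on.

```lisp
(defconstant +compat-char-font-limit+ 1
  "Only font 0 exists when the obsolete font feature is not implemented.")

(defun compat-char-font (char)
  "Obsolete: always returns 0 for compatibility."
  (declare (ignore char))
  0)

(defun compat-digit-char (weight &optional (radix 10) (font 0))
  "Like DIGIT-CHAR, but accepts the obsolete FONT argument.  Returns NIL
unless FONT is below +COMPAT-CHAR-FONT-LIMIT+, i.e. unless it is 0."
  (and (integerp font)
       (<= 0 font)
       (< font +compat-char-font-limit+)
       (digit-char weight radix)))
```

Returning NIL for a nonzero font follows the convention that DIGIT-CHAR returns NIL when it cannot construct the requested character.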


∂22-Jan-88  0234	CL-Characters-mailer 	character set predicates    
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 22 Jan 88  02:34:12 PST
Date: Fri, 22 Jan 88 02:28:18 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880122.022818.baggins@IBM.com>
Subject: character set predicates

Larry suggested to me that we not try to invent the
correct set of xxx-char-p predicates
(e.g., kanji-char-p, hiragana-char-p, greek-char-p, etc.) but
instead use the names listed in the ISO standard character sets.
This sounds like a good idea; now we only have to find the
list.  In fact, I imagine we can reference the ISO standard without
having to incorporate the list into ANSI.


∂26-Jan-88  1928	CL-Characters-mailer 	Font    
Received: from REAGAN.AI.MIT.EDU by SAIL.Stanford.EDU with TCP; 26 Jan 88  19:28:16 PST
Received: from JONES.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 88658; Tue 26-Jan-88 22:28:00 EST
Date: Tue, 26 Jan 88 22:27 EST
From: Robert W. Kerns <RWK@AI.AI.MIT.EDU>
Subject: Font
To: baggins@ibm.com
cc: cl-characters@sail.stanford.edu
In-Reply-To: <880122.022047.baggins@IBM.com>
Message-ID: <880126222758.5.RWK@JONES.AI.MIT.EDU>

    Date: Fri, 22 Jan 88 02:20:47 PST
    From: Thom Linden <baggins@ibm.com>

      Bob Kerns paper contains a set of changes to eliminate char-font
    and allows for some migratory behavior.  I think the [migration]
    aids not be made part of the standard but be suggestions as bridges
    an implementation may provide.  I would like to get a straw vote
    over the network as to everyone else's opinion?

This seems reasonable to me.  So far as anyone can tell, nobody
has ever implemented the Font field.  (I haven't checked with
Coral Software to see what they do on the Macintosh; that would seem
to me to be the place most likely to have done so.  I'll check with
them shortly.)

∂26-Jan-88  1942	CL-Characters-mailer 	Type Hierarchies  
Received: from REAGAN.AI.MIT.EDU by SAIL.Stanford.EDU with TCP; 26 Jan 88  19:42:10 PST
Received: from JONES.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 88661; Tue 26-Jan-88 22:42:01 EST
Date: Tue, 26 Jan 88 22:41 EST
From: Robert W. Kerns <RWK@AI.AI.MIT.EDU>
Subject: Type Hierarchies
To: baggins@ibm.com
cc: cl-characters@sail.stanford.edu
In-Reply-To: <880122.015246.baggins@IBM.com>
Message-ID: <880126224159.6.RWK@JONES.AI.MIT.EDU>

    Date: Fri, 22 Jan 88 01:52:46 PST
    From: Thom Linden <baggins@ibm.com>

    No one has mentioned Bob Kern's document.  The type hierarchies
    in the JEIDA, IBM and Kern documents are essentially identical
    (excepting thin vs. base, fat vs. extended, and Bob's user-extensions).

      Bob makes a valid point that the two-byte encodings may make way
    for three, etc. later.  But, it seems best to hide that from the
    language as much as possible.  I suggest that extended would always
    mean the 'largest overhead character storage class'.

The issue here is:  What should existing code, written using STRING and
STRING-CHAR mean?  Should code written in the most general current
fashion continue to mean the most general thing?  Or should it mean the
most efficient?

The assumption behind my proposal is that it should mean the most general,
and if you want a more specific, but more space-efficient, type, you use
a new name.

So far as Symbolics is concerned, having STRING-CHAR mean a more specific
type would be LESS of a problem, since in the current Symbolics software,
STRING-CHAR means the 1-byte kind of characters.

The trade-off, in terms of users' code, would be:

1)  If STRING-CHAR is more general, users' code will get less efficient when
an implementation implements the new standard, but will work for all input.

2)  If STRING-CHAR is more specific, users' code will retain its efficiency,
but may no longer work for the entire range of input found, say, in files or
other strings.

Whether case 2 would be viewed as an incompatibility or not depends on the
exact contract for the code in question.  For example, a file copy or string
utility would definitely be regarded as having been broken by the change,
while other code might be regarded as just not taking advantage of a new feature.

By the way, I should make my position in this clear.  I am no longer
affiliated with Symbolics.  While my opinions and views are probably
indicative of views there, and I have some influence and many contacts
there, I have no official connection, and my views are my own.  I
continue to be concerned with conventional as well as specialized
architectures, though.

∂04-Feb-88  0020	CL-Characters-mailer 	Forwarding note from Ito-san
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 4 Feb 88  00:19:52 PST
Date: Wed, 03 Feb 88 10:49:00 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880203.104900.baggins@IBM.com>
Subject: Forwarding note from Ito-san



-------------------------------------------------------------


To: Thom Linden, Chairman of character subcommittee, Common Lisp, ANSI
From: Takayasu Ito, Chairman of Japanese SC22/Lisp WG and JEIDA
      committee on Lisp standardization

Subject: Comments on IBM Proposal "Common LISP - Proposed Extensions
        for International Character Set Handling" (Version 01.11.87)

We have received the proposal through Mr. T. Kurokawa, P-member of our
committee. Here is the summary of our comments compiled by him and
Dr. T. Yuasa. (More details may be obtained from them.)

1. Overall impression

We think this is an interesting proposal for initiating extensive
investigation of international character set handling. We need,
however, to continue work on many aspects of this area.

2. We have had several meetings on this subject. The following is a list
of comments presented at these occasions.

Please note that these do not yet constitute our committee's formal statement.
-- The locality of 'equivalence class' must be maintained as suggested
by Larry Masinter. A variable such as *equivalence-class* would do.
-- It is important to define the 'base' character set.
It is still under hot dispute, but one argues, for example, that
the base should be clearly defined as single-byte, and the extension
should be defined by each national standardization body.
Another argues that, in Japan, it
should be two bytes in size, or that its maximum should be around 64K. We should
leave the actual implementation of the character to be
implementation-dependent so that US or Europe can enjoy the efficient
implementation of single byte size.
-- The implementation of 'equivalence class table' will be the key for
the efficiency.
-- The relationship between 'equivalence class' and 'readtable' or
'character macro' should be investigated further. We may be able to
reduce the primitives around these character input facilities.
-- The proposal may not be sufficiently abstract. To those who have enough
experience in Common Lisp implementation, the document reflects too much
of a real (perhaps trial) implementation.
For example, the csid for base is defined as '0' for the US implementation.

3. Cooperation should be continued.
We regard our cooperation on international character set handling
as indispensable and fruitful. We would like to continue to exchange our
ideas on this subject.


∂04-Feb-88  1732	CL-Characters-mailer 	Re: X3J13 meeting in March  
Received: from Xerox.COM by SAIL.Stanford.EDU with TCP; 4 Feb 88  17:32:05 PST
Received: from Cabernet.ms by ArpaGateway.ms ; 04 FEB 88 17:32:11 PST
Date: 4 Feb 88 17:32 PST
From: Masinter.pa@Xerox.COM
Subject: Re: X3J13 meeting in March
In-reply-to: Thom Linden <baggins@ibm.com>'s message of Wed, 06 Jan 88 09:58:06
 PST
To: baggins@ibm.com
cc: cl-characters@sail.stanford.edu
Message-ID: <880204-173211-1471@Xerox>

Thom:

For all of those who are staying in Palo Alto for the duration of the meeting,
adding the 40-50 minute commute each way (for a total of 4.5 hours of commute
time) seems to be a considerable imposition.  

It would seem to pose far fewer difficulties for almost all of the subcommittee
members to hold the meetings in Palo Alto, since that is where the X3J13 meeting
is being held. 

Jan Zubkoff has offered to arrange meeting rooms in Palo Alto for subcommittee
meetings; why not take her up on the offer?

I've been on the road and just returned; I'm sorry for my late reply to this
message. 
It is likely that the cleanup committee will meet on Tuesday morning 15 March,
which would interfere with my attending a meeting in Palo Alto until 1 PM and at
ARC until 2 PM.


∂08-Feb-88  1143	CL-Characters-mailer 	subcommittee meeting   
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 8 Feb 88  11:39:33 PST
Date: Mon, 08 Feb 88 11:22:20 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880208.112220.baggins@IBM.com>
Subject: subcommittee meeting

Larry has expressed interest in holding the subcommittee meetings
in Palo Alto to ease the commute.  What is the feeling of the
rest of the committee?  Please answer the following short
questionnaire:

   I am planning on attending the March meeting:  YES/NO
   Subcommittee meeting at Almaden (San Jose) is OK:  YES/NO/DONTCARE
   I will be available to attend subcommittee meetings from:

                            Date               Hours

                          14 Mar               9-4pm
                          15 Mar               9-4pm
                          18 Mar               9-4pm

Please respond by 11 Feb so I can make alternate arrangements if
necessary.

Regards,
  Thom

∂08-Feb-88  1818	CL-Characters-mailer 	subcommittee meeting   
Received: from AI.AI.MIT.EDU by SAIL.Stanford.EDU with TCP; 8 Feb 88  18:17:51 PST
Date: Mon,  8 Feb 88 21:18:15 EST
From: "Robert W. Kerns" <RWK@AI.AI.MIT.EDU>
Subject:  subcommittee meeting
To: baggins@IBM.COM
cc: cl-characters@SAIL.STANFORD.EDU
In-reply-to: Msg of Mon 08 Feb 88 11:22:20 PST from Thom Linden <baggins at ibm.com>
Message-ID: <323585.880208.RWK@AI.AI.MIT.EDU>

    Date: Mon, 08 Feb 88 11:22:20 PST
    From: Thom Linden <baggins at ibm.com>
    To:   X3J13: Character Subcommittee <cl-characters at sail.stanford.edu>
    Re:   subcommittee meeting

    Larry has expressed interest in holding the subcommittee meetings
    in Palo Alto to ease the commute.  What is the feeling of the
    rest of the committee?  Please answer the following short
    questionnaire:

       I am planning on attending the March meeting:  YES/NO
YES
       Subcommittee meeting at Almaden (San Jose) is OK:  YES/NO/DONTCARE
I would prefer Palo Alto.  I can handle San Jose; I do have
friends there I intend to visit, but Palo Alto would leave me
more flexibility.
       I will be available to attend subcommittee meetings from:

                                Date               Hours

                              14 Mar               9-4pm
                              15 Mar               9-4pm
                              18 Mar               9-4pm
Yes, so far as I know, but please, let's not consume 100%
of all three days!  I'll be suspicious of any work we do at that pace.

    Please respond by 11 Feb so I can make alternate arrangements if
    necessary.

    Regards,
      Thom

∂12-Feb-88  0620	CL-Characters-mailer 	Font    
Received: from XX.LCS.MIT.EDU by SAIL.Stanford.EDU with TCP; 12 Feb 88  06:20:19 PST
Received: from LIVE-OAK.LCS.MIT.EDU by XX.LCS.MIT.EDU via Chaosnet; 12 Feb 88 09:17-EST
Received: from ACORN.Gold-Hill.DialNet.Symbolics.COM by MIT-LIVE-OAK.DialNet.Symbolics.COM via DIAL with SMTP id 80237; 12 Feb 88 09:18:27-EST
Received: from BOSTON.Gold-Hill.DialNet.Symbolics.COM by ACORN.Gold-Hill.DialNet.Symbolics.COM via CHAOS with CHAOS-MAIL id 93908; Thu 11-Feb-88 05:30:59-EST
Date: Fri, 12 Feb 88 08:32 est
From: mike%acorn@oak.lcs.mit.edu
To: RWK@AI.AI.MIT.EDU
Subject: Font
Cc: baggins@ibm.com, cl-characters@sail.stanford.edu

          Bob Kerns's paper contains a set of changes to eliminate char-font
        and allows for some migratory behavior.  I think the [migration]
        aids should not be made part of the standard, but should be offered
        as suggested bridges an implementation may provide.  I would like to
        get a straw vote over the network on everyone else's opinion.
    
    This seems reasonable to me.  So far as anyone can tell, nobody
    has ever implemented the Font field.  


We don't implement font either. I think char-font should be dropped.
char-bits is more of a problem, but I think it should be dropped too.
For compatibility, we should introduce a "non-standard" migration path
type called "keychord" to represent objects like #\c-m-s-h-S, etc.
The confusion between characters as code points in an implicit or
explicit character set and keyboard key combinations is incredibly
useless and should go away. It is particularly troublesome
when you consider a Japanese keyboard sequence, where you need
several keyboard and keychord hits to generate a character, and
bits don't correspond to keychords or "shifting" in any reasonable
way.
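To make the proposed separation concrete, here is a minimal sketch in Python (not Common Lisp) of a keychord type kept distinct from characters. The class name `Keychord`, its fields, and the modifier list are hypothetical, chosen only to mirror the #\c-m-s-h-S notation; they are not taken from any proposal text.

```python
from dataclasses import dataclass

# Hypothetical modifier names, mirroring the c-m-s-h prefixes in #\c-m-s-h-S.
MODIFIERS = ("control", "meta", "super", "hyper")

@dataclass(frozen=True)
class Keychord:
    """A key press plus modifier keys; deliberately NOT a character."""
    key: str                          # the character on the key cap, e.g. "S"
    modifiers: frozenset = frozenset()

    def __post_init__(self):
        # Reject modifier names outside the fixed set above.
        unknown = set(self.modifiers) - set(MODIFIERS)
        if unknown:
            raise ValueError(f"unknown modifiers: {sorted(unknown)}")

    def name(self) -> str:
        # Render in the c-m-s-h-X style, always in the same modifier order.
        prefix = "".join(m[0] + "-" for m in MODIFIERS if m in self.modifiers)
        return prefix + self.key

chord = Keychord("S", frozenset({"control", "meta", "super", "hyper"}))
print(chord.name())  # c-m-s-h-S
```

Because a `Keychord` never compares equal to a string, code that reads keystrokes and code that processes text cannot silently mix the two; a Japanese input method would consume a sequence of such chords and emit characters.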



...mike beckerle
Gold Hill

    


∂16-Feb-88  1112	CL-Characters-mailer 	March meeting
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 16 Feb 88  11:12:09 PST
Date: Tue, 16 Feb 88 10:04:31 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880216.100431.baggins@IBM.com>
Subject: March meeting

  In general, folks wanted to meet in the PA area.  I have requested
a meeting room for Monday 14 Mar 1-4pm and Tuesday 15 Mar 9-4pm.  I'll
relay the confirmation as soon as I have it.

Regards,
  Thom

∂16-Feb-88  1506	CL-Characters-mailer 	bits and charsets 
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 16 Feb 88  15:06:37 PST
Date: Tue, 16 Feb 88 12:39:18 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880216.123918.baggins@IBM.com>
Subject: bits and charsets

    We don't implement font either. I think char-font should be dropped.
    char-bits is more of a problem, but I think it should be dropped too.
    For compatibility, we should introduce a "non-standard" migration path
    type called "keychord" to represent objects like #\c-m-s-h-S, etc.
    The confusion between characters as code points in an implicit or
    explicit character set and keyboard key combinations is incredibly
    useless and should go away. It is particularly troublesome
    when you consider a Japanese keyboard sequence, where you need
    several keyboard and keychord hits to generate a character, and
    bits don't correspond to keychords or "shifting" in any reasonable
    way.


  In thinking about Mike's note, it occurs to me that explicit support
for character sets actually encompasses bits.  For example, an
implementation could support a character set named 'meta-cyrillic'
containing all the Cyrillic characters combined with Alt, Ctrl, etc.,
distinct from the plain (unmodified) Cyrillic characters.  Similarly,
this could apply to any conventional character set an implementation
chooses to support.
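A minimal sketch, in Python, of folding modifier bits into character-set identity as suggested above. `Char`, `MODIFIED_SETS`, `strip_modifiers`, `has_modifiers`, and the use of Unicode values as codes within the 'cyrillic' set are all illustrative assumptions, not part of any proposal.

```python
from typing import NamedTuple

class Char(NamedTuple):
    """A character identified by (character set, code within that set)."""
    charset: str  # hypothetical character-set name, e.g. "cyrillic"
    code: int     # code within the set; Unicode values used here for convenience

# Hypothetical registry: each modifier-qualified set maps to its plain base set.
MODIFIED_SETS = {
    "meta-cyrillic": "cyrillic",
    "control-cyrillic": "cyrillic",
}

def strip_modifiers(ch: Char) -> Char:
    """Return the plain character underlying a modifier-qualified one."""
    return Char(MODIFIED_SETS.get(ch.charset, ch.charset), ch.code)

def has_modifiers(ch: Char) -> bool:
    """True if ch comes from a modifier-qualified set, i.e. it 'has bits'."""
    return ch.charset in MODIFIED_SETS

plain_zhe = Char("cyrillic", 0x0416)      # Cyrillic capital letter ZHE
meta_zhe = Char("meta-cyrillic", 0x0416)  # the same letter with Meta

print(plain_zhe == meta_zhe)                   # False: distinct characters
print(strip_modifiers(meta_zhe) == plain_zhe)  # True
```

In this sketch no separate bits field is needed: the modifier is simply part of which set the character belongs to.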

∂19-Feb-88  1434	CL-Characters-mailer 	Re: bits and charsets  
Received: from Xerox.COM by SAIL.Stanford.EDU with TCP; 19 Feb 88  14:34:18 PST
Received: from Cabernet.ms by ArpaGateway.ms ; 19 FEB 88 14:24:22 PST
Date: 19 Feb 88 14:24 PST
From: Masinter.pa@Xerox.COM
Subject: Re: bits and charsets
In-reply-to: Thom Linden <baggins@ibm.com>'s message of Tue, 16 Feb 88 12:39:18
 PST
To: baggins@ibm.com
cc: cl-characters@sail.stanford.edu
Message-ID: <880219-142422-9739@Xerox>

Well, the most natural embedding of "bits" is just directly within the character
code space, with or without the character code equivalence space.

On the subject of character sets, I've thought of the following problem with any
kind of dynamic adjustment of character equivalence tables: hash tables which
hash by string-equal won't work if string-equal might depend either on some
dynamically changeable state or even on a bindable state, because a key hashed
under one equivalence table may not be found under another.
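The hash-table hazard can be sketched in Python under stated assumptions: a hypothetical global `EQUIVALENCES` dict stands in for a dynamically adjustable character-equivalence table, and the hypothetical `StringEqualTable` class, like any hash table, folds and hashes each key once, at insertion time. Changing the equivalences afterwards makes an existing entry unreachable.

```python
# Hypothetical global, dynamically changeable equivalence table:
# characters mapped to the same value compare equal under fold().
EQUIVALENCES = {}

def fold(s: str) -> str:
    """Canonical key under the current equivalence table, plus case folding."""
    return "".join(EQUIVALENCES.get(c, c) for c in s.casefold())

class StringEqualTable:
    """Hash table keyed by a string-equal-like test; keys are folded at put time."""

    def __init__(self):
        self._buckets = {}

    def put(self, key: str, value) -> None:
        self._buckets[fold(key)] = value

    def get(self, key: str, default=None):
        return self._buckets.get(fold(key), default)

table = StringEqualTable()
table.put("caf\u00e9", 1)    # "café", stored under the folded key "café"
EQUIVALENCES["\u00e9"] = "e"  # the equivalence table changes after insertion
print(table.get("caf\u00e9"))  # None: the same key now folds to "cafe"
```

Keeping entries reachable would require rehashing every such table whenever the equivalences change or are rebound; the table itself has no way to notice the change.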


∂29-Feb-88  1304	CL-Characters-mailer 	subcommittee meeting   
Received: from IBM.COM by SAIL.Stanford.EDU with TCP; 29 Feb 88  13:04:37 PST
Date: Mon, 29 Feb 88 12:29:52 PST
From: Thom Linden <baggins@ibm.com>
To: "X3J13: Character Subcommittee" <cl-characters@sail.stanford.edu>
Message-ID: <880229.122952.baggins@IBM.com>
Subject: subcommittee meeting

The characters subcommittee will meet from 9am-5pm on both
Monday, 14 Mar, and Tuesday, 15 Mar, in the Hyatt Delmonte room.

Regards,
  Thom